feat(example): add a kv example based on fjall kv store#1658

Open
ariesdevil wants to merge 1 commit into databendlabs:main from ariesdevil:dev_ariesdevil
Conversation

@ariesdevil
Contributor

@ariesdevil ariesdevil commented Feb 10, 2026

Add a fjall-based KV example. fjall is an LSM key-value store written in pure Rust. Compared to RocksDB, it significantly reduces compile times.

fjall currently lacks remove_range, but I expect it will be implemented soon.

Checklist

  • Updated guide with pertinent info (may not always apply).
  • Squash down commits to one or two logical commits which clearly describe the work you've done.
  • Unit tests are a friend :)

@ariesdevil ariesdevil force-pushed the dev_ariesdevil branch 3 times, most recently from 5bf7453 to 49de246 Compare February 10, 2026 14:20
Member

@drmingdrmer drmingdrmer left a comment

@drmingdrmer reviewed 4 files and all commit messages, and made 1 comment.
Reviewable status: 4 of 18 files reviewed, 1 unresolved discussion (waiting on ariesdevil).


examples/raft-kv-fjall/src/log_store.rs line 185 at r1 (raw file):

        for k in start_index..10_000 {
            self.keyspace_logs().remove(id_to_bin(k)).map_err(|e| io::Error::other(e.to_string()))?;
        }

You must not truncate the log starting from its left boundary: if the server crashes mid-loop, it will leave a hole in the log, which is forbidden. This is documented on the trait method.


@drmingdrmer
Member

Thanks for the contribution! A fjall-based storage backend is a great idea — pure Rust, LSM-tree, faster compile times compared to RocksDB. That addresses a real pain point.

One suggestion on scope: rather than a full KV application example (with HTTP API, network layer, main.rs, and test-cluster.sh), would you consider trimming this down to a storage-only example, similar to examples/rocksstore?

rocksstore contains only:

  • log_store.rs (RaftLogStorage impl)
  • state_machine.rs (RaftStateMachine impl)
  • lib.rs and a unit test wired to the storage test suite

No HTTP server, no CLI binary, no network glue. The storage test suite (Suite::test_all) gives it solid coverage without any of that scaffolding.

The reason: full application examples tend to require more ongoing maintenance (dependency updates, API compatibility, network stack changes), and they duplicate what raft-kv-memstore already demonstrates. A focused storage implementation would be immediately useful to anyone evaluating fjall as a backend, easier to maintain, and a cleaner fit alongside rocksstore.

Happy to help shape it if you want to go that route.

@drmingdrmer
Member

Another consideration worth thinking about: log storage and state machine storage are independent in openraft, and they serve very different roles.

Log storage is on the critical path of consensus — every append, vote, and commit goes through it, so it needs low write latency and sequential append performance. LSM-tree engines like fjall are optimized for write throughput and point lookups, but the compaction overhead and write amplification make them a less natural fit for a Raft log compared to append-oriented storage.

The state machine, on the other hand, is much less latency-sensitive from the consensus perspective. Its only hard requirement for consensus is the ability to produce a snapshot. An LSM-tree engine is actually a good fit here: random reads and writes, key-value lookups, compaction — all align well with typical state machine workloads.

There is already a standalone state machine example at examples/sm-mem that shows this separation in practice.

Have you considered implementing fjall as a RaftStateMachine only, leaving log storage to a more append-friendly backend? That would highlight one of openraft's design strengths (the two traits are fully decoupled), play to fjall's actual strengths, and give users a practical example of mixing storage backends.

@marvin-j97
Contributor

Log storage is on the critical path of consensus — every append, vote, and commit goes through it, so it needs low write latency and sequential append performance. LSM-tree engines like fjall are optimized for write throughput and point lookups, but the compaction overhead and write amplification make them a less natural fit for a Raft log compared to append-oriented storage.

If the writes are strictly sequential, fjall does not actually compact anything, it just uses trivial moves. RocksDB should be doing the same. Even then, as long as the write workload is not way too intensive, compactions don't affect latencies that much (especially when your writes are synchronous). The base write amp (without compaction) is ~2x, which is really not that much.

@ariesdevil
Contributor Author

Happy to help shape it if you want to go that route.

Sure, thx.

@drmingdrmer
Member

drmingdrmer commented Feb 21, 2026

If the writes are strictly sequential, fjall does not actually compact anything, it just uses trivial moves. RocksDB should be doing the same. Even then, as long as the write workload is not way too intensive, compactions don't affect latencies that much (especially when your writes are synchronous). The base write amp (without compaction) is ~2x, which is really not that much.

Fair point on the compaction side — for a strictly sequential append workload, trivial moves do apply and compaction pressure stays low.

But there is a deeper structural issue beyond compaction: fjall has its own WAL (it calls it a "journal"). Looking at the source, it sits in src/journal/ and supports three persist modes: Buffer (OS cache only), SyncData (fdatasync), and SyncAll (fsync). When a write returns from fjall, the data has gone through fjall's journal first, then eventually lands in an SST file.

This creates a durability accounting problem for Raft log storage. Raft requires that once append() signals completion via IOFlushed, the entries are durable. To satisfy that with fjall underneath, you have to fsync through fjall's journal on every append — which means you are paying for two WALs on the write path: fjall's journal and the implicit durability contract of the Raft log itself. You do not get to bypass fjall's journaling to write directly to disk.

For a state machine this does not matter at all, since the state machine can always be rebuilt by replaying the log. Raft never requires the state machine to be durable — only the log must be. That is exactly why fjall looks like a strong fit for RaftStateMachine and a more awkward one for RaftLogStorage.

@marvin-j97
Contributor

To satisfy that with fjall underneath, you have to fsync through fjall's journal on every append

You do not get to bypass fjall's journaling to write directly to disk.

What's the alternative? Writing to a file manually has the same fsync costs. But with an LSM you get automatic spilling to SSTs if the journal gets too large, and the WAL is checksummed by default to prevent recovering corrupted data.
Flushing to SSTs happens in the background so it does not impact writers much.

@drmingdrmer
Member

drmingdrmer commented Feb 21, 2026

To satisfy that with fjall underneath, you have to fsync through fjall's journal on every append
You do not get to bypass fjall's journaling to write directly to disk.

What's the alternative? Writing to a file manually has the same fsync costs. But with an LSM you get automatic spilling to SSTs if the journal gets too large, and the WAL is checksummed by default to prevent recovering corrupted data. Flushing to SSTs happens in the background so it does not impact writers much.

Flushing to SSTs is unnecessary for a Raft log; that is the point.

Either way, there is some extra disk-write burden.

I would still like such an example to encourage applications to use the most appropriate storage, a pure WAL-like store rather than an LSM-based one. But since you say the impact is negligible, it is okay.
